amazon-redshift-python-driverを使ってPythonからRedshiftに接続する
データアナリティクス事業本部の森脇です。
本日AWSから、PythonのRedshiftドライバーに関する情報 がアナウンスされました。
元々クローズドで開発されていたドライバーがOSS化され、amazon-redshift-python-driver という名称で一般公開されました。
以下の特徴をそなえています。
- Apache 2.0 License
- Pure Pythonライブラリ
- Database API 2.0 準拠
- pandas, numpyと簡単に連携可能
今まではpsycopg2を使うことが一般的でしたが、今後はこちらのドライバーを使うのが良さそうですね。
早速試してみました。
(pythonのバージョンは3.8.3で試しています)
インストール
pipを使ってインストールできます。
[tmp]$ python --version Python 3.8.3 [tmp]$ rm -rf rc-test/ [tmp]$ python -m venv rc-test [tmp]$ source rc-test/bin/activate (rc-test) [tmp]$ pip install redshift_connector Collecting redshift_connector Using cached https://files.pythonhosted.org/packages/3a/3c/4fcf81a3ab8e75431b4938e13f83f69f774b567e41263e41e4abf71f9e4f/redshift_connector-2.0.405-py3-none-any.whl Collecting requests>=2.23.0<2.24.0 (from redshift_connector) Using cached https://files.pythonhosted.org/packages/45/1e/0c169c6a5381e241ba7404532c16a21d86ab872c9bed8bdcd4c423954103/requests-2.24.0-py2.py3-none-any.whl Collecting lxml>=4.2.5<4.6.0 (from redshift_connector) Downloading https://files.pythonhosted.org/packages/8c/7d/796d00718450fc687c045399429184ec8b91d62095926f490808cda321bd/lxml-4.6.1-cp38-cp38-manylinux1_x86_64.whl (5.4MB) |████████████████████████████████| 5.4MB 4.4MB/s Collecting scramp>=1.2.0<1.3.0 (from redshift_connector) Using cached https://files.pythonhosted.org/packages/0a/86/7ef1b93e8f453f297303e98869451e544588e8d76f2dd73ad17e8dabc5fc/scramp-1.2.0-py3-none-any.whl Collecting botocore>=1.19.8<1.20.0 (from redshift_connector) Using cached https://files.pythonhosted.org/packages/24/7f/9298904c3b9b6f6e49f1cda7e3c92c8b2e8639020e11091fdfa2a345b354/botocore-1.19.12-py2.py3-none-any.whl Collecting numpy>=1.15.4<1.20.0 (from redshift_connector) Downloading https://files.pythonhosted.org/packages/e5/7d/fe25dcdfc46d14e037cbb87e480ac067da36f56a8e65928bf1040ff35793/numpy-1.19.4-cp38-cp38-manylinux2010_x86_64.whl (14.5MB) |████████████████████████████████| 14.5MB 1.1MB/s Collecting pandas==0.25.3 (from redshift_connector) Downloading https://files.pythonhosted.org/packages/7b/fd/41698f20fd297cef2dc43a72a8ca42d149eaf7d954f1fb2bd3fc366a658d/pandas-0.25.3-cp38-cp38-manylinux1_x86_64.whl (10.4MB) |████████████████████████████████| 10.4MB 2.4MB/s Collecting wheel>=0.33 (from redshift_connector) Using cached https://files.pythonhosted.org/packages/a7/00/3df031b3ecd5444d572141321537080b40c1c25e1caa3d86cdd12e5e919c/wheel-0.35.1-py2.py3-none-any.whl Collecting BeautifulSoup4>=4.7.0<4.8.0 (from redshift_connector) Using cached https://files.pythonhosted.org/packages/d1/41/e6495bd7d3781cee623ce23ea6ac73282a373088fcd0ddc809a047b18eae/beautifulsoup4-4.9.3-py3-none-any.whl Collecting pytz>=2020.1<2020.2 (from redshift_connector) Using cached https://files.pythonhosted.org/packages/12/f8/ff09af6ff61a3efaad5f61ba5facdf17e7722c4393f7d8a66674d2dbd29f/pytz-2020.4-py2.py3-none-any.whl Collecting boto3>=1.16.8<1.17.0 (from redshift_connector) Using cached https://files.pythonhosted.org/packages/60/06/78b1aa1768cfbab8a333db53101b69b800cd357d4dec7625de872ba5b453/boto3-1.16.12-py2.py3-none-any.whl Collecting chardet<4,>=3.0.2 (from requests>=2.23.0<2.24.0->redshift_connector) Using cached https://files.pythonhosted.org/packages/bc/a9/01ffebfb562e4274b6487b4bb1ddec7ca55ec7510b22e4c51f14098443b8/chardet-3.0.4-py2.py3-none-any.whl Collecting idna<3,>=2.5 (from requests>=2.23.0<2.24.0->redshift_connector) Using cached https://files.pythonhosted.org/packages/a2/38/928ddce2273eaa564f6f50de919327bf3a00f091b5baba8dfa9460f3a8a8/idna-2.10-py2.py3-none-any.whl Collecting certifi>=2017.4.17 (from requests>=2.23.0<2.24.0->redshift_connector) Using cached https://files.pythonhosted.org/packages/5e/c4/6c4fe722df5343c33226f0b4e0bb042e4dc13483228b4718baf286f86d87/certifi-2020.6.20-py2.py3-none-any.whl Collecting urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 (from requests>=2.23.0<2.24.0->redshift_connector) Using cached https://files.pythonhosted.org/packages/56/aa/4ef5aa67a9a62505db124a5cb5262332d1d4153462eb8fd89c9fa41e5d92/urllib3-1.25.11-py2.py3-none-any.whl Collecting python-dateutil<3.0.0,>=2.1 (from botocore>=1.19.8<1.20.0->redshift_connector) Using cached https://files.pythonhosted.org/packages/d4/70/d60450c3dd48ef87586924207ae8907090de0b306af2bce5d134d78615cb/python_dateutil-2.8.1-py2.py3-none-any.whl Collecting jmespath<1.0.0,>=0.7.1 (from botocore>=1.19.8<1.20.0->redshift_connector) Using cached https://files.pythonhosted.org/packages/07/cb/5f001272b6faeb23c1c9e0acc04d48eaaf5c862c17709d20e3469c6e0139/jmespath-0.10.0-py2.py3-none-any.whl Collecting soupsieve>1.2; python_version >= "3.0" (from BeautifulSoup4>=4.7.0<4.8.0->redshift_connector) Using cached https://files.pythonhosted.org/packages/6f/8f/457f4a5390eeae1cc3aeab89deb7724c965be841ffca6cfca9197482e470/soupsieve-2.0.1-py3-none-any.whl Collecting s3transfer<0.4.0,>=0.3.0 (from boto3>=1.16.8<1.17.0->redshift_connector) Using cached https://files.pythonhosted.org/packages/69/79/e6afb3d8b0b4e96cefbdc690f741d7dd24547ff1f94240c997a26fa908d3/s3transfer-0.3.3-py2.py3-none-any.whl Collecting six>=1.5 (from python-dateutil<3.0.0,>=2.1->botocore>=1.19.8<1.20.0->redshift_connector) Using cached https://files.pythonhosted.org/packages/ee/ff/48bde5c0f013094d729fe4b0316ba2a24774b3ff1c52d924a8a4cb04078a/six-1.15.0-py2.py3-none-any.whl Installing collected packages: chardet, idna, certifi, urllib3, requests, lxml, scramp, six, python-dateutil, jmespath, botocore, numpy, pytz, pandas, wheel, soupsieve, Beautifu lSoup4, s3transfer, boto3, redshift-connector Successfully installed BeautifulSoup4-4.9.3 boto3-1.16.12 botocore-1.19.12 certifi-2020.6.20 chardet-3.0.4 idna-2.10 jmespath-0.10.0 lxml-4.6.1 numpy-1.19.4 pandas-0.25.3 python -dateutil-2.8.1 pytz-2020.4 redshift-connector-2.0.405 requests-2.24.0 s3transfer-0.3.3 scramp-1.2.0 six-1.15.0 soupsieve-2.0.1 urllib3-1.25.11 wheel-0.35.1 (rc-test) [tmp]$ pip freeze beautifulsoup4==4.9.3 boto3==1.16.12 botocore==1.19.12 certifi==2020.6.20 chardet==3.0.4 idna==2.10 jmespath==0.10.0 lxml==4.6.1 numpy==1.19.4 pandas==0.25.3 python-dateutil==2.8.1 pytz==2020.4 redshift-connector==2.0.405 requests==2.24.0 s3transfer==0.3.3 scramp==1.2.0 six==1.15.0 soupsieve==2.0.1 urllib3==1.25.11
pandas, numpyも併せてインストールされました。
使ってみる
実際にRedshiftに接続してみます。
TEMPテーブルの作成,データの挿入、クエリ結果の取得を試してみます。
import redshift_connector # APIレベルの確認 print(f"{redshift_connector.apilevel=}") # 接続情報 params = { "host": "xxxxxxxxxxxx.ap-northeast-1.redshift.amazonaws.com", "port": 5439, "database": "xxxxxxxxxx", "user": "xxxxxxxxxx", "password": "xxxxxxxxxx", } # コネクションの作成 with redshift_connector.connect(**params) as con: # カーソルの作成 with con.cursor() as cur: # テーブル作成 cur.execute("CREATE TEMP TABLE sample (id int, name VARCHAR(100))") # データ挿入 sample_data = ( (1, "a"), (2, "b"), (3, "c"), (4, "d"), (5, "e"), ) cur.executemany("INSERT INTO sample (id, name) VALUES (%s, %s)", sample_data) # クエリ(Database API形式) result = cur.execute("SELECT * FROM sample ORDER BY 1").fetchall() print("クエリ結果:\n", result) # クエリ(pandas DataFrame) result = cur.execute("SELECT * FROM sample ORDER BY 1").fetch_dataframe() print("クエリ結果(pandas):\n", result) # クエリ(numpy) result = cur.execute("SELECT * FROM sample ORDER BY 1").fetch_numpy_array() print("クエリ結果(numpy):\n", result)
実行してみます。
(rc-test) [tmp]$ python rc_sample.py redshift_connector.apilevel='2.0' クエリ結果: ([1, 'a'], [2, 'b'], [3, 'c'], [4, 'd'], [5, 'e']) クエリ結果(pandas): id name 0 1 a 1 2 b 2 3 c 3 4 d 4 5 e クエリ結果(numpy): [['1' 'a'] ['2' 'b'] ['3' 'c'] ['4' 'd'] ['5' 'e']]
正常に動作し、通常のクエリ形式、pandas、numpyそれぞれの形式でデータが取得できました。
まとめ
psycopg2はPure Pythonでなかったり、ライセンス的に使いづらいケースがありました。
「amazon-redshift-python-driver」を使うことで、より容易にRedshiftに接続できそうですね。